2 research outputs found

    Transfer Learning for Speech Recognition on a Budget

    Full text link
    End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers.Comment: Accepted for 2nd ACL Workshop on Representation Learning for NL

    Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles

    No full text
    Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers
    corecore